{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "pip install pandas==0.23.0\n",
    "pip install numpy==1.14.3\n",
    "pip install matplotlib==3.0.3\n",
    "pip install seaborn==0.8.1\n",
    "pip install PyAthena==1.8.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install mxnet==1.5.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "from scipy.sparse import lil_matrix\n",
    "\n",
    "import boto3\n",
    "import botocore\n",
    "import sagemaker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = boto3.session.Session()\n",
    "region_name = session.region_name\n",
    "\n",
    "sagemaker_session = sagemaker.Session()\n",
    "bucket = sagemaker_session.default_bucket()\n",
    "\n",
    "print(bucket)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extracting parameters from FM model\n",
    "\n",
    "Now that we have the model created and stored in SageMaker, we can download the same and extract the parameters.  The FM model is stored in MxNet format.\n",
    "\n",
    "This section is reproduced with minor modifications from the blog cited above for the sake of completeness."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download model data\n",
    "\n",
    "Skip the next cell block if you have already downloaded the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mxnet as mx\n",
    "import os\n",
    "\n",
    "model_file_name = \"model.tar.gz\"\n",
    "model_full_path = fm.output_path + \"/\" + fm.latest_training_job.job_name + \"/output/\" + model_file_name\n",
    "print(\"Model Path: \", model_full_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Download FM model \n",
    "\n",
    "!rm -rf ./model\n",
    "!mkdir -p ./model/\n",
    "!aws s3 cp $model_full_path ./model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%bash \n",
    "# TODO:  Fix this\n",
    "\n",
    "#Extract model file for loading to MXNet\n",
    "echo $model_full_path\n",
    "cd ./model/\n",
    "ls -al\n",
    "tar xzvf model.tar.gz\n",
    "unzip -o model_algo-1\n",
    "mv symbol.json model-symbol.json\n",
    "mv params model-0000.params\n",
    "\n",
    "ls -al"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extract model data to create item and user latent matrixes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('num_customers: {}'.format(num_customers))\n",
    "print('num_products: {}'.format(num_products))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mxnet as mx\n",
    "#Extract model data\n",
    "m = mx.module.Module.load('./model/model', 0, False, label_names=['out_label'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "V = m._arg_params['v'].asnumpy()\n",
    "w = m._arg_params['w1_weight'].asnumpy()\n",
    "b = m._arg_params['w0_weight'].asnumpy()\n",
    "\n",
    "# user latent matrix - concat (V[u], 1) \n",
    "ones = np.ones(num_customers).reshape((num_customers, 1))\n",
    "knn_user_matrix = np.concatenate((V[:num_customers], ones), axis=1)\n",
    "print('knn_user_matrix.shape')\n",
    "print(knn_user_matrix.shape)\n",
    "\n",
    "# item latent matrix - concat(V[i], w[i]). \n",
    "# Note:  The +1 is not part of the original example\n",
    "knn_item_matrix = np.concatenate((V[num_customers + 1:], w[num_customers + 1:]), axis=1)\n",
    "print('knn_item_matrix.shape')\n",
    "print(knn_item_matrix.shape)\n",
    "\n",
    "knn_train_label = np.arange(1, num_products + 1)\n",
    "print('knn_train_label')\n",
    "print(knn_train_label.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate Influence Matrix\n",
    "\n",
    "Per the paper cited above, the influence matrix for user $j$ is calculated as:\n",
    "\n",
    "$$J_j=U^T(U W_j U^T)^{-1}UW_j$$\n",
    "\n",
    "Let's map those symbols to the variables in this notebook.\n",
    "\n",
    "* $U$ is the embedding matrix for items.  In this formula, it is the transpose of the item matrix we extracted from the FM model.  So $U={knn\\_item\\_matrix}^{T}$\n",
    "* $U^T={knn\\_item\\_matrix}$\n",
    "* $W$ is a binary matrix with 1s on the diagonal in positions corresponding the known entries of X for this user.  In other words, it's a matrix of size $nb\\_movies$ by $nb\\_movies$, with a one on the diagonal in row and column $i$ where user $j$ rated movie $i$.\n",
    "\n",
    "Now let's confirm that our dimensions line up properly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "knn_item_matrix.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "knn_user_matrix.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build the matrix $W$.\n",
    "\n",
    "For the sake of an example, let's pick user `846`, just because that user was the first row in our training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(num_products)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "W = np.zeros([num_products, num_products])\n",
    "W.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_customer_ids = df[df.star_rating > 3].customer_id\n",
    "print(test_customer_ids[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find a `customer_id`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_of_interest = test_customer_ids[0]\n",
    "\n",
    "u1 = df[df.customer_id == user_of_interest]\n",
    "u2 = df[df.customer_id == user_of_interest]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u1.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u2.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u_all = np.concatenate((np.array(u1['product_id']), np.array(u2['product_id'])), axis=0)\n",
    "u_all"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#print(product_index)\n",
    "print(product_index[product_index.product_id == 'B00BWDH368'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for u_rating in u_all:\n",
    "    # Convert the user_of_interest <=> index using customer_index\n",
    "    # Subtract the num_customers since the indexes are combined customers + products (perhaps we reset_index() above)\n",
    "    u_rating_idx = product_index[product_index.product_id == u_rating].item - num_customers\n",
    "    W[u_rating_idx, u_rating_idx] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculate $J$ for user $j$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# influence matrix = u_tr * (u*w*u_tr)-1 * u * w\n",
    "J1 = np.matmul(np.transpose(knn_item_matrix), W) # u*w\n",
    "J2 = np.matmul(J1, knn_item_matrix) # u*w*u_tr\n",
    "J3 = np.linalg.inv(J2) # (u*w*u_tr)-1\n",
    "J4 = np.matmul(knn_item_matrix, J3) # u_tr * (u*w*u_tr)-1\n",
    "J5 = np.matmul(J4, np.transpose(knn_item_matrix)) # u_tr * (u*w*u_tr)-1 * u\n",
    "J = np.matmul(J5, W) # # u_tr * (u*w*u_tr)-1 * u * w"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "J.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explaining recommendations for a user\n",
    "\n",
    "Now we can use the influence matrix to calculate the two metrics explained in the research paper:\n",
    "\n",
    "_Influence_ of the actual rating that user $j$ assigned to item $k$ on the predicted rating for item $i$.  This is calculated as:\n",
    "\n",
    "$${\\beta}_k = J_{ik}^j$$\n",
    "\n",
    "In other words, we just look up the element at row $i$ and column $k$ of the influence matrix $J$ for user $j$\n",
    "\n",
    "_Impact_ of the actual rating that user $j$ assigned to item $k$ on the predicted rating for item $i$.  This is calculated as:\n",
    "\n",
    "$${\\gamma}_k = {\\beta}_{k}x_{kj}$$\n",
    "\n",
    "In other words, we multiply the influence by the actual rating that user $j$ gave to item $k$\n",
    "\n",
    "In this example I'll just use influence, since we converted the ratings to a binary like/don't like.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Look up influence for a test recommendation\n",
    "\n",
    "For our selected user, let's find a movie in our test set that they rated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u2.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "movie_to_rate = 60"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = fm_predictor.predict(X_test[8451:8452].toarray()) # use the row number from the test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "influence_i = J[movie_to_rate-1,:] # movies are indexed at 1, so we offset to 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "influence_i[movie_to_rate-1] = 0.0 # zero this out; it's the influence of the movie itself"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# join with movie names\n",
    "df_movies = pd.read_csv('ml-100k/u.item', sep='|', header=None, names=['movie_id', 'movie_name', 'c3','c4','c5','c6','c7',\n",
    "                                                                      'c9','c9','c10','c11','c12','c13','c14','c15','c16','c17',\n",
    "                                                                      'c18','c19','c20','c21','c22','c23','c24'])\n",
    "df_movies.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_influence = pd.DataFrame(data={'influence': influence_i, 'movie': df_movies['movie_name']})\n",
    "df_influence.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This movie is 'Three Colors:Blue', a French drama that probably appeals to 'art house' movie goers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_movies[df_movies['movie_id'] == movie_to_rate]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And what do we recommend?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_top_influence = df_influence.nlargest(20, 'influence')\n",
    "df_top_influence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "plt.style.use('ggplot')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ax = df_top_influence.plot(x ='movie', y='influence', kind = 'barh', figsize=(20,20), title='Top 20 Influences', color='blue')\n",
    "ax.set_ylabel(\"Movie\")\n",
    "ax.set_xlabel(\"Influence\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.sort(u_all)[:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "movie_to_rate = 9"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rate_data = np.zeros((1, num_features))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rate_data[0, user_of_interest-1] = 1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rate_data[0, nb_users + movie_to_rate -1] = 1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = fm_predictor.predict(rate_data) \n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "influence_i = J[movie_to_rate-1,:] # movies are indexed at 1, so we offset to 0\n",
    "influence_i[movie_to_rate-1] = 0.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_influence = pd.DataFrame(data={'influence': influence_i, 'movie': df_movies['movie_name']})\n",
    "df_influence.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're looking at the movie 'Dead Man Walking', which was an acclaimed movie about a prisoner on Death Row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_movies[df_movies['movie_id'] == movie_to_rate]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_top_influence = df_influence.nlargest(20, 'influence')\n",
    "df_top_influence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ax = df_top_influence.plot(x ='movie', y='influence', kind = 'barh', figsize=(20,20), title='Top 20 Influences', color='blue')\n",
    "ax.set_ylabel(\"Movie\")\n",
    "ax.set_xlabel(\"Influence\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Are these results intuitively satisfying?  I'm not quite sure, but remember that built this model with a relatively limited data set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean-up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fm_predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
